
Motivation

Never before in history have there been as many people on Earth as there are right now. The number has grown enormously over time, from around 1 billion in 1800 to 7.5 billion in 2017.

Estimates of the population at earlier times exist too: when agriculture emerged, around 10,000 BC, the world population was somewhere between 1 million and 15 million. Even earlier, about 70,000 years ago, some studies suggest that humans may have gone through a bottleneck of 1,000 to 10,000 people, according to the Toba supervolcanic eruption theory\(^{[1]}\).

Given the population growth of the last century, what should we expect for the next one? Will it lead to major changes in our lifestyle, or to wars, poverty, shortages of primary resources and so on?
Or are these just unwarranted fears, and everything will sort itself out?

The Data

Data Sources

For this study I combined several datasets.

I started my analysis by combining some datasets from World Bank Open Data (https://data.worldbank.org/), from which I downloaded the indicators for total population, birth rate and death rate (both per 1,000 people) and the Gross Domestic Product of each Country; these datasets contain yearly values from 1960 to 2016 for almost every Country in the world, with some missing data.

To analyze the situation in Italy in earlier years as well (1700 - 1960), I added the data found at https://www.populstat.info/Europe/italyc.htm.

To estimate the world’s population from year 1 AD onwards, I also used data from https://www.ecology.com/population-estimates-year-2050/.

Finally, I also made some use of the “Countries of the world” dataset on Kaggle (https://www.kaggle.com/fernandol/countries-of-the-world), which describes several characteristics of each country; of those, I actually used only the Region (defined later) and the Area.

Data Analysis

To analyze the data I used several R packages: dplyr, leaflet, ggplot2 and tidyr are the main ones, but I also used geojsonio, rworldmap and countrycode to parse the data, plotly to create some interactive plots, and htmlwidgets and htmltools to save and display the interactive maps.

Let’s now start exploring our datasets.

Early Ages

First of all, let’s look at the early stages of human population growth, from 1 AD (anno Domini) to 1800 AD.

To generate this plot, I just had to read a simple table. I used plotly to get an interactive view of the data.
As we can see, there are lower and upper estimates for these values: over this period, the world population grows with a trend that looks roughly exponential.
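One simple way to check whether a trend is roughly exponential is to regress the logarithm of the population on the year: under exponential growth \(P(t) = P_0 e^{rt}\) the relationship is a straight line with slope \(r\). The sketch below is an illustration, not the author's code: `estimate_exp_rate` is a hypothetical helper, and it is tested on a synthetic series rather than on the ecology.com estimates (in practice the lower and upper estimates would be fitted separately).

```r
# Estimate a constant exponential growth rate r from (year, population) pairs
# by fitting log(population) = a + r * year with ordinary least squares.
# `estimate_exp_rate` is a hypothetical helper, not part of the original scripts.
estimate_exp_rate <- function(year, population) {
  fit <- lm(log(population) ~ year)
  unname(coef(fit)["year"])
}

# Synthetic series growing exactly 0.1% per year (illustrative, not real data)
years <- seq(1, 1801, by = 100)
pop <- 200e6 * exp(0.001 * (years - 1))

r_hat <- estimate_exp_rate(years, pop)
print(r_hat)  # very close to 0.001
```

If the growth rate itself changes over time, the points of log(population) versus year bend away from the fitted line, so plotting them is a useful companion check.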

World Bank Data

Let’s take a look at the World Bank Open Data datasets.
Downloading an indicator from this site gives three files:

  • The first one contains only a source note, which describes the indicator in more detail: for “Total Population” it reads: “Total population is based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates.”
  • The second one contains a more detailed description of each Country.
  • The third one is the actual dataset, which contains the following columns:
    • “Country.Name”
    • “Country.Code”
    • “Indicator.Name”
    • “Indicator.Code”
    • years from 1960 to 2017 in the form “X1960”

Indicator Name and Code are of little use for our purpose, because they only briefly describe the table content. Country Name and Code, on the other hand, are essential: each row of the table contains all the data for a single Country, for that indicator, over the 1960 - 2017 time frame. The columns named after the years hold the actual values.
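This one-row-per-Country, one-column-per-year layout is often called “wide” format; for plotting and grouping it is usually handier in “long” format, with one row per (Country, Year) pair. tidyr’s `pivot_longer()` does this; the dependency-free base R sketch below shows the same idea on a tiny made-up table (the helper name `to_long` and all values are hypothetical).

```r
# A tiny table in the World Bank "wide" layout (values are made up)
wide <- data.frame(
  Country.Name = c("Country A", "Country B"),
  Country.Code = c("AAA", "BBB"),
  X1960 = c(50, 46),
  X1961 = c(51, 47)
)

# Reshape to "long" format: one row per (Country, Year) pair.
# unlist() stacks the year columns one after another, so country names
# repeat once per year column and each year repeats once per row.
to_long <- function(df) {
  year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)
  data.frame(
    Country = rep(df$Country.Name, times = length(year_cols)),
    Year = rep(as.integer(sub("^X", "", year_cols)), each = nrow(df)),
    Value = unlist(df[year_cols], use.names = FALSE)
  )
}

long <- to_long(wide)
print(long)
```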

I started with some data parsing: I renamed the year columns to remove the ‘X’ in front of each year, then dropped the rows and columns that contained only NA values (keeping, to be clear, those that contain just some NAs).
After that, I added to each Country the “Continent” and “Region” variables (the latter indicates the part of the Continent where the Country is located), to check how the trends of the selected indicators depend on them. To do this, I added some columns to the dataset with dplyr’s mutate() function. This work can be found in the “Data_Cleaning.R” script, which I used to parse the data and save the cleaned dataset for the rest of the analysis.
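The first two cleaning steps can be sketched in base R as below. This is a hypothetical reconstruction, not the contents of “Data_Cleaning.R” (which used dplyr): `clean_wb` and the toy table are invented for illustration. The ‘X’ prefix comes from `read.csv()`, whose default `check.names = TRUE` turns a header such as `1960` into the syntactically valid name `X1960`.

```r
# Hypothetical raw table mimicking the World Bank layout
raw <- data.frame(
  Country.Name = c("Country A", "Country B", "Country C"),
  X1960 = c(10, NA, NA),
  X1961 = c(11, 20, NA),
  X2017 = c(NA, NA, NA)
)

clean_wb <- function(df) {
  # 1. Strip the leading "X" from the year column names
  names(df) <- sub("^X([0-9]{4})$", "\\1", names(df))
  year_cols <- grep("^[0-9]{4}$", names(df), value = TRUE)
  # 2. Drop year columns that are entirely NA
  all_na_col <- vapply(df[year_cols], function(x) all(is.na(x)), logical(1))
  df <- df[, !(names(df) %in% year_cols[all_na_col]), drop = FALSE]
  year_cols <- year_cols[!all_na_col]
  # 3. Drop rows with no data in any remaining year (rows with only some NAs are kept)
  has_data <- rowSums(!is.na(df[year_cols])) > 0
  df[has_data, , drop = FALSE]
}

cleaned <- clean_wb(raw)
print(cleaned)
```

Country B survives even though its 1960 value is missing, matching the rule of removing only rows that are entirely NA.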

I then checked the World’s total population in different years, to verify that the values were consistent with the well-known figures: again I used dplyr commands to select and count data, with something like:

library(dplyr)

# Look up the 1960 value of the "World" row of the cleaned dataset
total_population %>%
    select(Country, `1960`) %>%
    filter(Country == "World")

As you can see, the “World” total is itself a row of the dataset.
This calls for care: besides “World”, other non-Country aggregates are also stored as rows of the dataset. Indeed, when I ran my first tests, things didn’t add up!
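The double counting is easy to reproduce. The sketch below uses made-up numbers and a hypothetical metadata table in which aggregates are marked by an empty Region; whether the downloaded country-description file (the second file described above) marks aggregates this way is an assumption to verify against the actual files.

```r
# Toy data: two countries plus two aggregate rows (all values are made up)
pop <- data.frame(
  Country = c("Country A", "Country B", "High income", "World"),
  Code    = c("AAA", "BBB", "HIC", "WLD"),
  `1960`  = c(30, 70, 60, 100),
  check.names = FALSE
)

# Hypothetical metadata: aggregates are assumed to have an empty Region
meta <- data.frame(
  Code   = c("AAA", "BBB", "HIC", "WLD"),
  Region = c("Region 1", "Region 2", "", "")
)
country_codes <- meta$Code[meta$Region != ""]

naive_total   <- sum(pop$`1960`[pop$Country != "World"])     # includes "High income"
country_total <- sum(pop$`1960`[pop$Code %in% country_codes]) # real countries only
world_total   <- pop$`1960`[pop$Country == "World"]

print(c(naive = naive_total, countries = country_total, world = world_total))
```

Only the filtered sum matches the “World” row; the naive sum counts the aggregate population twice.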

What has the trend of the world population been in recent years? I plotted it and, for a better analysis, placed it next to the yearly growth percentage of the whole world, computed as:

\[ \text{GrowthPercentage} = \frac{P_{t}-P_{t_0}}{P_{t_0}} \cdot 100 \]

where \(P_{t}\) is the population in a given year and \(P_{t_0}\) the population in the preceding year.
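On a yearly series the growth percentage is a one-liner. This sketch assumes, as in the usual year-over-year definition, that the change is divided by the preceding year’s population \(P_{t_0}\); the function name and the input values are illustrative.

```r
# Year-over-year growth percentage: 100 * (P_t - P_{t0}) / P_{t0},
# where P_{t0} is the population of the preceding year.
# The result has one element fewer than the input (no value for the first year).
growth_percentage <- function(population) {
  100 * diff(population) / head(population, -1)
}

# Illustrative series (not real data): +10% and then +10% again
g <- growth_percentage(c(100, 110, 121))
print(g)  # 10 10
```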

The image below shows how people were distributed across the Continents and their Regions in 2017.
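The totals behind such a breakdown come from a simple group-and-sum. The sketch below uses base R’s `aggregate()` on a made-up table with hypothetical columns `Continent`, `Region` and `pop_2017`; with dplyr the same result could be obtained with `group_by()` and `summarise()`.

```r
# Made-up populations for four hypothetical countries
countries <- data.frame(
  Country   = c("A", "B", "C", "D"),
  Continent = c("Europe", "Europe", "Asia", "Asia"),
  Region    = c("Southern Europe", "Western Europe", "Eastern Asia", "Eastern Asia"),
  pop_2017  = c(10, 20, 30, 40)
)

# Total population per Continent, and per Region within each Continent
by_continent <- aggregate(pop_2017 ~ Continent, data = countries, FUN = sum)
by_region    <- aggregate(pop_2017 ~ Continent + Region, data = countries, FUN = sum)

print(by_continent)
print(by_region)
```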

Density Indicator

Starting from the “total population” dataset, I created a new one containing the population density of each Country, to display it on a map: I took the “Area” measure from the “countries_world” dataset downloaded from Kaggle and computed the density as the number of people divided by the area.
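Joining the two sources by country name and dividing is enough to get the density. This base R sketch uses invented values and hypothetical column names; note that the density is expressed in people per whatever unit the Area column uses, and that an inner join silently drops countries whose names do not match, so listing them is worthwhile.

```r
# Hypothetical inputs: population (World Bank) and area (Kaggle); values invented
population <- data.frame(Country = c("Country A", "Country B", "Country C"),
                         pop_2017 = c(1000, 5000, 300))
areas <- data.frame(Country = c("Country A", "Country B"),
                    Area = c(10, 250))

# Inner join on the country name, then density = people / area
dens <- merge(population, areas, by = "Country")
dens$density <- dens$pop_2017 / dens$Area

# Countries lost in the join, e.g. because of name mismatches
unmatched <- setdiff(population$Country, areas$Country)

print(dens)
print(unmatched)
```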

To color the map, I then had to bin the densities into a discrete scale of values.
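Binning a continuous variable into classes is what `cut()` does. The break points and labels below are hypothetical and would need to be chosen to suit the real density distribution (quantile-based breaks are a common alternative); leaflet’s `colorBin()` can also bin a numeric domain directly when building the palette.

```r
density <- c(2, 35, 120, 480, 9000)  # illustrative densities

# Hypothetical break points; right = FALSE gives intervals [0,10), [10,50), ...
breaks <- c(0, 10, 50, 100, 500, Inf)
labels <- c("0-10", "10-50", "50-100", "100-500", "500+")

density_class <- cut(density, breaks = breaks, labels = labels, right = FALSE)
print(density_class)
```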

Leaflet

Let’s now look at the World’s situation. Below is a leaflet representation of the distribution of people across the whole world, created with the code in the “leaflet_map.R” script. For it, I had to rearrange the data further, because some Country names did not match the ones leaflet expected, so their data was not shown at all: for example, “United States of America” was expected instead of “United States”, and I had to check a lot of names manually.
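A small lookup table makes this manual renaming repeatable, and `setdiff()` lists the names that still have no match. In this sketch only the “United States” entry comes from the text; the helper names and the map name list are hypothetical, and the table would be extended with each mismatch found.

```r
# Names that differ between the data and the map's shapes
name_fixes <- c("United States" = "United States of America")

# Replace any name found in the lookup table, leave the others untouched
harmonise_names <- function(x, fixes) {
  hit <- x %in% names(fixes)
  x[hit] <- unname(fixes[x[hit]])
  x
}

# Names in the data that the map does not know
missing_on_map <- function(data_names, map_names) setdiff(data_names, map_names)

data_names <- c("Italy", "United States")
map_names  <- c("Italy", "United States of America")  # hypothetical map names

print(missing_on_map(data_names, map_names))  # "United States"
fixed <- harmonise_names(data_names, name_fixes)
print(missing_on_map(fixed, map_names))       # character(0)
```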

Gross domestic product

The GDP is defined as “an aggregate measure of production equal to the sum of the gross values added of all resident and institutional units engaged in production (plus any taxes, and minus any subsidies, on products not included in the value of their outputs)”, and it is considered the “world’s most powerful statistical indicator of national development and progress”.

Demographic Transition

What happens when a poor country grows wealthier and moves to an industrialized economic system? The death rate falls, so the birth rate, which is usually high in a poor country, is no longer offset by a high death rate, and the population starts to grow. At some point, fears of overpopulation begin to rise.

In 1929 the American demographer Warren Thompson developed the theory of the Demographic Transition[5], which describes a transition from high birth and death rates to low birth and death rates.

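The model describes four, or in some versions five, stages in the evolution of population growth. In summary:
  • Stage 1: high birth rates and high death rates, so the population grows slowly.
  • Stage 2: death rates fall quickly thanks to better food supply, sanitation and medicine, while birth rates stay high; the population grows fast.
  • Stage 3: birth rates start to fall as well, and population growth slows down.
  • Stage 4: both birth and death rates are low, and the population stabilizes.
  • Stage 5 (proposed by some authors, and debated): fertility moves away from replacement level, most often below it, so the population may start to decline.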

Let’s look at Italy as an example: the graphs below show the trend of births, deaths and total population from 1960 to 2016.

Demographic Transition in Italy


For a longer view, I looked for a dataset that also covers earlier years: the chart below shows the trend of the total population starting from 1700.


The green shaded area of the second graph covers the same time span as the green-line graph above.

Italy is currently in Stage 4 of the Demographic Transition Model: as the charts above show, both birth and death rates are low; moreover, the Population Growth Rate (PGR) is low, which leads to a stabilization of the population size.

\[ PGR = \frac{P(t_2) - P(t_1)}{P(t_1)(t_2 - t_1)} \]

A positive PGR indicates that the population is increasing, a negative one that it is decreasing, and zero that it has not changed over the selected time interval.
Let’s compute it on the data used above, with one-year time intervals.
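The PGR formula translates directly into vector operations over consecutive observations. Dividing by \(t_2 - t_1\) turns the change into a rate per year, which matters for series with irregular spacing, such as historical estimates. The function name and values in this sketch are illustrative.

```r
# Population Growth Rate between consecutive observations:
# PGR = (P(t2) - P(t1)) / (P(t1) * (t2 - t1)), a rate per unit of time.
pgr <- function(population, year) {
  stopifnot(length(population) == length(year))
  diff(population) / (head(population, -1) * diff(year))
}

# Illustrative values (not real data), with one two-year interval
years <- c(2000, 2001, 2003)
pop   <- c(100, 110, 99)
rates <- pgr(pop, years)
print(rates)  # 0.10 -0.05
```

With one-year intervals, as used here for Italy, the PGR is simply the yearly growth fraction, so a negative value means the population shrank during that year.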

As the plot shows, the PGR values are quite low, and in the last years (2015 - 2016) they even turn negative. This is good evidence that Italy’s population is starting to decline, consistent with Italy being in Stage 4 of the Demographic Transition.





What are the Prospects?








Let’s try to predict the World’s population in the coming years. To do so, I will use the logistic model of population growth, described by the Pearl-Reed logistic equation:

\[ \frac{dN}{dt} = rN(1-\frac{N}{K}) \]

This equation describes the self-limiting growth of a biological population; it was first published (in a different form) in 1838 by Verhulst, a Belgian mathematician and statistician. Pearl and Reed popularized it in the twentieth century.

In the equation, \(N\) is the number of individuals at time \(t\), \(r\) the intrinsic growth rate and \(K\) the carrying capacity, i.e. the maximum number of individuals that the environment can support. Integrating it gives:

\[ N(t) = \frac{K N_0 e^{rt}}{K + N_0(e^{rt}-1)} \]

where \(N_0 = N(0)\) is the initial population.
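A quick numerical check confirms that the integrated solution satisfies the logistic equation. The sketch implements the solution in the equivalent form \(N(t) = K N_0 / (N_0 + (K - N_0)e^{-rt})\) (numerator and denominator divided by \(e^{rt}\)), compares a central finite-difference derivative with \(rN(1 - N/K)\), and checks the limits \(N(0) = N_0\) and \(N \to K\). The parameter values are arbitrary.

```r
# Integrated logistic solution: K * N0 / (N0 + (K - N0) * exp(-r * t))
logistic <- function(t, K, r, N0) {
  K * N0 / (N0 + (K - N0) * exp(-r * t))
}

K <- 10; r <- 0.5; N0 <- 1  # arbitrary illustrative parameters
times <- seq(0, 20, by = 0.5)
h <- 1e-5

# Central finite-difference derivative versus the right-hand side of dN/dt = rN(1 - N/K)
N <- logistic(times, K, r, N0)
dN_numeric <- (logistic(times + h, K, r, N0) - logistic(times - h, K, r, N0)) / (2 * h)
dN_model   <- r * N * (1 - N / K)

print(max(abs(dN_numeric - dN_model)))  # tiny: the ODE is satisfied
print(logistic(0, K, r, N0))            # equals N0
print(logistic(100, K, r, N0))          # approaches K
```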

The main feature of the logistic model is its sigmoid shape: the population first grows almost exponentially, then its growth slows down, and the curve is bounded by the carrying capacity \(K\) of the environment.
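A short derivation locates the turning point of the sigmoid. Differentiating the logistic equation once more with respect to time gives

\[ \frac{d^2N}{dt^2} = \frac{d}{dt}\left[ rN\left(1-\frac{N}{K}\right) \right] = r\left(1-\frac{2N}{K}\right)\frac{dN}{dt} \]

which, for \(0 < N < K\), vanishes only at \(N = K/2\). There the growth is fastest:

\[ \left.\frac{dN}{dt}\right|_{N=K/2} = r\,\frac{K}{2}\left(1-\frac{1}{2}\right) = \frac{rK}{4} \]

Setting \(K N_0 / (N_0 + (K - N_0)e^{-rt}) = K/2\) in the integrated solution gives the time of the inflection, \(t^{*} = \frac{1}{r}\ln\frac{K - N_0}{N_0}\) (positive when \(N_0 < K/2\)). Before \(t^{*}\) the curve looks exponential; after it, growth slows towards \(K\).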

I then used the World Population data between 1960 and 2017 to fit this model and predict the population in 2100 (the code is in the prospects.R script), and plotted the resulting projection.
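One way to fit a logistic curve in R is nonlinear least squares with `nls()` and the self-starting model `SSlogis()`, which parameterizes the curve as `Asym / (1 + exp((xmid - x) / scal))`, so that `Asym` plays the role of \(K\) and `1 / scal` that of \(r\). This is a sketch on synthetic data, not the contents of prospects.R. A caveat worth keeping in mind: when the observations cover only the early or middle part of the sigmoid, as a 1960 - 2017 series may, \(K\) is weakly constrained by the data and long-range projections can be very sensitive to it.

```r
set.seed(42)

# Synthetic logistic data (not the World Bank series): K = 10, midpoint 50, scale 10
x <- 0:100
true_K <- 10
y <- true_K / (1 + exp((50 - x) / 10)) + rnorm(length(x), sd = 0.05)
d <- data.frame(x = x, y = y)

# SSlogis supplies starting values automatically
fit <- nls(y ~ SSlogis(x, Asym, xmid, scal), data = d)
est <- coef(fit)
print(est)

# Extrapolate beyond the observed range
pred <- predict(fit, newdata = data.frame(x = 150))
print(pred)
```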

References

[1] https://en.wikipedia.org/wiki/Toba_catastrophe_theory
[2] https://www.kaggle.com/fernandol/countries-of-the-world
[3] https://data.worldbank.org/
[4] https://www.ecology.com/population-estimates-year-2050/ (early ages)
[5] https://en.wikipedia.org/wiki/Demographic_transition
[6] https://en.wikipedia.org/wiki/Logistic_function
[7] https://en.wikipedia.org/wiki/Projections_of_population_growth
[8] http://www.clker.com/clipart-530947.html (clipart)